18.415 Project - Using a Count-Min sketch structure for easier analysis of Bloom filter false positive rates

Author

Yun William Yu

Abstract

The classical proof for Bloom filter false positive rates was shown to be incorrect due to a subtle error regarding independence of events. Although the previously computed false positive rate is still asymptotically correct, it is incorrect for small parameter values and is in general only a lower bound [BGK08]. Indeed, the correct analysis for Bloom filters does not admit a convenient closed form, though efficient computations do exist [CRJ10]. In this paper, we outline the strategy of the new analysis. Furthermore, we demonstrate that for large parameter values, we can use a Count-Min [CM05] data structure to asymptotically recover the classical Bloom filter false positive rate using only the naive analysis scheme. This has no practical applications, as the Bloom filter is strictly better in every sense, but the construction may be useful pedagogically in demonstrating the subtleties involved when working with independence. Lastly, exact expected false positive rates are trivially computable for this structure; though the rates are worse than for regular Bloom filters, this may be of some theoretical use.
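The comparison described in the abstract can be sketched numerically. The abstract does not spell out the construction, so the following is only an illustration under stated assumptions: the Count-Min-style structure is taken to be k rows of m/k bits each, with one independent, ideally uniform hash function per row, and n distinct inserted elements. The function names and parameters are hypothetical and are not taken from the paper.

```python
def classical_bloom_fp(m: int, n: int, k: int) -> float:
    """Classical (naive) Bloom filter estimate: (1 - (1 - 1/m)^(k*n))^k.

    Per [BGK08] this is asymptotically correct but in general only a
    lower bound on the true false positive rate.
    """
    return (1.0 - (1.0 - 1.0 / m) ** (k * n)) ** k


def row_partitioned_fp(m: int, n: int, k: int) -> float:
    """False positive rate for the ASSUMED Count-Min-style layout.

    Assumption (not from the source): k rows of width m/k, one
    independent uniform hash per row. Rows are disjoint, so the k
    "bit is set" events are independent and the naive product is exact.
    """
    if m % k != 0:
        raise ValueError("this sketch assumes m is divisible by k")
    width = m // k
    # Probability that one fixed cell of a row is hit by at least one
    # of the n inserted elements.
    p_cell_set = 1.0 - (1.0 - 1.0 / width) ** n
    return p_cell_set ** k


if __name__ == "__main__":
    for m, n, k in [(64, 8, 4), (1_000_000, 100_000, 5)]:
        print(m, n, k, classical_bloom_fp(m, n, k), row_partitioned_fp(m, n, k))
```

Under these assumptions the row-partitioned rate is never below the classical estimate, since (1 - k/m)^n <= (1 - 1/m)^(kn), and the two values approach each other as m and n grow with n/m fixed.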

Similar resources

Reducing False Positives of a Bloom Filter using Cross-Checking Bloom Filters

A Bloom filter is a compact data structure that supports membership queries on a set, allowing false positives. The simplicity and the excellent performance of a Bloom filter make it a standard data structure of great use in many network applications. In reducing the false positive rate of a Bloom filter, it is well known that the size of a Bloom filter and accordingly the number of hash indice...

A Cuckoo Filter Modification Inspired by Bloom Filter

Probabilistic data structures are popular in membership queries, network applications, and so on. Bloom Filter and Cuckoo Filter are two popular space-efficient models that are incorporated into the set membership checking part of many important protocols. They are compact representations of data that use hash functions to randomize a set of items. Being able to store more elements while keeping a reaso...

Optimized hash for network path encoding with minimized false positives

The Bloom filter is a space efficient randomized data structure for representing a set and supporting membership queries. Bloom filters intrinsically allow false positives. However, the space savings they offer outweigh the disadvantage if the false positive rates are kept sufficiently low. Inspired by the recent application of the Bloom filter in a novel multicast forwarding fabric, this paper...

Accurate Per-Flow Measurement with Bloom Sketch

A sketch is a probabilistic data structure that is widely used for per-flow measurement in networks. The most common sketches are the CM sketch and its several variants. However, given a limited memory size, these sketches always significantly overestimate some flows, exhibiting poor accuracy. To address this issue, we proposed a novel sketch named the Bloom sketch, combining the sketch with the B...

Stream Clustering using Probabilistic Data Structures

Most density based stream clustering algorithms separate the clustering process into an online and offline component. Exact summarized statistics are being employed for defining micro-clusters or grid cells during the online stage followed by macro-clustering during the offline stage. This paper proposes a novel alternative to the traditional two phase stream clustering scheme, introducing sket...

Publication date: 2013
